Web archive derivatives of the GeoCities collection from the Internet
Archive. The derivatives were created with the Archives Unleashed Toolkit.
The geocities-aut-parquet-derivatives.xz file, once extracted, produces a directory for each derivative in Apache Parquet, a columnar storage format. Similarly, the geocities-aut-csv-derivatives.xz file produces a directory for each derivative in CSV format. These derivatives are generally small enough to work with on a local machine and can be easily converted to Pandas DataFrames. See this notebook for examples.
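As a rough illustration of the DataFrame conversion, the sketch below gathers the data files in one derivative directory and concatenates them with pandas. It assumes each directory holds one or more part files, plus marker files such as _SUCCESS that should be skipped. That layout is an assumption, not something stated in this record. The function names and the "domain-frequency" directory name are hypothetical. Reading Parquet with pandas also requires an engine such as pyarrow.

```python
from pathlib import Path


def find_part_files(derivative_dir, suffix):
    """Return the sorted data files ending in `suffix` inside one derivative directory."""
    return sorted(
        p
        for p in Path(derivative_dir).iterdir()
        # Skip marker and checksum files (names starting with "_" or ".").
        if p.is_file()
        and p.name.endswith(suffix)
        and not p.name.startswith(("_", "."))
    )


def load_derivative(derivative_dir, suffix=".csv", **read_kwargs):
    """Concatenate every part file in a derivative directory into one DataFrame."""
    # Imported here so that find_part_files works without pandas installed.
    import pandas as pd

    reader = pd.read_parquet if suffix == ".parquet" else pd.read_csv
    parts = find_part_files(derivative_dir, suffix)
    if not parts:
        raise FileNotFoundError(f"no {suffix} files found in {derivative_dir}")
    frames = [reader(p, **read_kwargs) for p in parts]
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage; the directory name is an assumption:
# df = load_derivative("geocities-aut-csv-derivatives/domain-frequency", suffix=".csv")
# print(df.head())
```

Extra keyword arguments are passed through to the pandas reader, so options such as header=None or names=[...] can be supplied if the CSV files turn out to have no header row.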
The geocities-aut-domain-graph.xz file, once extracted, produces a Raw Network file, which can be loaded into Gephi or any other network analysis software of your choice. You will have to use that program to lay out the network yourself.
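For a quick look at the graph without opening a network program, the following sketch counts how many edges touch each node using only the Python standard library. It assumes the Raw Network file is GraphML, which this record does not confirm, so check the extracted file's extension and header first. The function name and the file name in the usage comment are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def degree_counts(graphml_path):
    """Count how many edge endpoints reference each node id in a GraphML file."""
    degrees = Counter()
    # iterparse reads the file incrementally instead of loading it in one call.
    for _, elem in ET.iterparse(graphml_path, events=("end",)):
        if elem.tag == GRAPHML_NS + "edge":
            degrees[elem.get("source")] += 1
            degrees[elem.get("target")] += 1
            # Drop the edge's attributes and children once it has been counted.
            elem.clear()
    return degrees


# Hypothetical usage; the file name is an assumption:
# for node_id, degree in degree_counts("geocities-aut-domain-graph.graphml").most_common(10):
#     print(node_id, degree)
```

The counts are keyed by node id. Mapping ids back to domain names depends on the data keys the file defines, which are not described here.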
The xz files can be extracted with the following commands (extracted size is 529G):